Methodologies for Spatially Interpolating Climate Adaptation Options from Econometrics Methods

Author

Maxwell Mkondiwa

1 Introduction

This notebook provides machine learning and spatial methods for assessing the impacts of climate on yields and identifying adaptation options in a spatially explicit manner. The goal is to generate spatially differentiated (gridded) estimates of the effects of weather extremes and adaptation options.

We use publicly available datasets in South Asia to demonstrate these models.

The code is divided into 4 major components. These include:

  1. Important geospatial packages

  2. Estimation approaches that provided individual farmer level estimates using models that allow nonlinearity of temperature effects and adaptation.

  3. Interpolation of point geocoded data using proximity polygons, nearest neigbor, inverse distance weighted, kriging, Random forest, spatial Bayesian gaussian models and spatial Bayesian geoadditive models.

  4. Interpolation and downscaling based on areal data and other methods

The key references for the estimation models are based on the R packages which include:

  • GRF (Generalized Random Forest) for causal ML

  • spBayes for Bayesian gaussian process models

  • BAMLSS for Bayesian geoadditive models

2 Important geospatial R packages: Terra, geodata,sf, sp

library(geodata)

India=gadm(country="IND", level=2, path="shp")
plot(India)

India_aoi=subset(India,India$NAME_1=="Bihar"|India$NAME_2%in%c("Ballia","Chandauli","Deoria","Ghazipur","Kushinagar","Maharajganj","Mau","Siddharth Nagar","Gorakhpur"))

plot(India_aoi)

plot(India_aoi, add=TRUE)

library(sf)

India_aoi_sf=st_as_sf(India_aoi)
library(mapview)

mapview(India_aoi_sf)
# Dissolve the district polygons to form new polygon of Bihar and EUP
library(sf)
India_aoi_sf_dis=st_union(India_aoi_sf)
mapview(India_aoi_sf_dis)
# From sf to sp
India_aoi_sf_dis_sp=as_Spatial(India_aoi_sf_dis)
# Mapping population density
# population=population(2015,05,path="rasters")
# 
# library(raster)
# pop_raster=raster(population)
# Pop_Bihar_EUP_cropped = crop(pop_raster,India_aoi_sf_dis_sp)
# 
# Pop_Bihar_EUP_cropped_m = mask(Pop_Bihar_EUP_cropped,India_aoi_sf_dis_sp)
# 
# library(Pop_Bihar_EUP_cropped_m)

3 Estimation Approaches

3.1 Causal Random Forest Model to Get Individual Treatment Effects

3.1.1 Sowing dates before Nov 20 vs after

library(grf)
library(policytree)

library(rio)
LDS=import("LDS_wheat_public_cleaned.csv")
monthlytemp=import("monthly_temp.csv")

LDS=cbind(LDS,monthlytemp)

LDSestim_sow=subset(LDS, select=c("Sowing_Date_Early","Sowing_Date_Schedule_rating_num","Sowing_Date_Schedule","L.tonPerHectare","I.q5505_weedSeverity_num","I.q5509_diseaseSeverity_num","I.q5506_insectSeverity_num","I.q5502_droughtSeverity_num",                                       "Nperha","P2O5perha","variety_type_NMWV","G.q5305_irrigTimes","A.q111_fGenderdum","Weedmanaged","temp","precip","wc2.1_30s_elev","M.q708_marketDistance","nitrogen_0.5cm","sand_0.5cm", "soc_5.15cm","O.largestPlotGPS.Latitude","O.largestPlotGPS.Longitude","april_mean_temp","march_mean_temp","feb_mean_temp"))

library(tidyr)
LDSestim_sow=LDSestim_sow %>% drop_na()


Y_cf_sowing=as.vector(LDSestim_sow$L.tonPerHectare)

## Causal random forest -----------------

X_cf_sowing=subset(LDSestim_sow, select=c("I.q5505_weedSeverity_num","I.q5509_diseaseSeverity_num","I.q5506_insectSeverity_num",
                                                  "Nperha","P2O5perha","variety_type_NMWV","G.q5305_irrigTimes","A.q111_fGenderdum","Weedmanaged","temp","precip","wc2.1_30s_elev",
                                                       "M.q708_marketDistance","nitrogen_0.5cm","sand_0.5cm", "soc_5.15cm","O.largestPlotGPS.Latitude","O.largestPlotGPS.Longitude","april_mean_temp","march_mean_temp","feb_mean_temp"))


W_cf_sowing <- as.factor(LDSestim_sow$Sowing_Date_Schedule)

# Probability random forest to create weights
W.multi_sowing.forest <- probability_forest(X_cf_sowing, W_cf_sowing,
  equalize.cluster.weights = FALSE,
  seed = 2
)
W.hat.multi.all_sowing <- predict(W.multi_sowing.forest, estimate.variance = TRUE)$predictions

# Regression forest to get expected responses 
Y.multi_sowing.forest <- regression_forest(X_cf_sowing, Y_cf_sowing,
  equalize.cluster.weights = FALSE,
  seed = 2
)

print(Y.multi_sowing.forest)
GRF forest object of type regression_forest 
Number of trees: 2000 
Number of training samples: 7562 
Variable importance: 
    1     2     3     4     5     6     7     8     9    10    11    12    13 
0.000 0.000 0.001 0.022 0.012 0.647 0.195 0.000 0.066 0.001 0.001 0.005 0.001 
   14    15    16    17    18    19    20    21 
0.001 0.004 0.002 0.024 0.015 0.001 0.001 0.002 
varimp.multi_sowing <- variable_importance(Y.multi_sowing.forest)
Y.hat.multi.all_sowing <- predict(Y.multi_sowing.forest, estimate.variance = TRUE)$predictions

# Fit multi-arm causal RF model
multi_sowing.forest <- multi_arm_causal_forest(X = X_cf_sowing, Y = Y_cf_sowing, W = W_cf_sowing ,W.hat=W.hat.multi.all_sowing,Y.hat=Y.hat.multi.all_sowing,seed=2) 

varimp.multi_sowing_cf <- variable_importance(multi_sowing.forest)

# Average treatment effects
multi_sowing_ate=average_treatment_effect(multi_sowing.forest, method="AIPW")
multi_sowing_ate
                      estimate    std.err            contrast outcome
T2_20Nov - T1_10Nov -0.1448949 0.03820657 T2_20Nov - T1_10Nov     Y.1
T3_30Nov - T1_10Nov -0.2882908 0.03637854 T3_30Nov - T1_10Nov     Y.1
T4_15Dec - T1_10Nov -0.4795005 0.03838067 T4_15Dec - T1_10Nov     Y.1
T5_16Dec - T1_10Nov -0.7188928 0.04067701 T5_16Dec - T1_10Nov     Y.1
 # Calibration check: Multi-arm causal RF does not yet calibration check
 ## We use binary causal RF to do that
 
W_cf_sowing_binary=as.vector(LDSestim_sow$Sowing_Date_Early) 

# Probability random forest to create weights
W.multi_sowing.forest_binary <- regression_forest(X_cf_sowing, W_cf_sowing_binary,
  equalize.cluster.weights = FALSE,
  seed = 2
)
W.hat.multi.all_sowing_binary <- predict(W.multi_sowing.forest_binary, estimate.variance = TRUE)$predictions


# Regression forest to get expected responses 
Y.multi_sowing.forest_binary <- regression_forest(X_cf_sowing, Y_cf_sowing,
  equalize.cluster.weights = FALSE,
  seed = 2
)

print(Y.multi_sowing.forest_binary)
GRF forest object of type regression_forest 
Number of trees: 2000 
Number of training samples: 7562 
Variable importance: 
    1     2     3     4     5     6     7     8     9    10    11    12    13 
0.000 0.000 0.001 0.022 0.012 0.647 0.195 0.000 0.066 0.001 0.001 0.005 0.001 
   14    15    16    17    18    19    20    21 
0.001 0.004 0.002 0.024 0.015 0.001 0.001 0.002 
varimp.multi_sowing_binary <- variable_importance(Y.multi_sowing.forest_binary)
Y.hat.multi.all_sowing_binary <- predict(Y.multi_sowing.forest_binary, estimate.variance = TRUE)$predictions

# Fit binary causal RF model
multi_sowing.forest_binary <- causal_forest(X = X_cf_sowing, Y = Y_cf_sowing, W = W_cf_sowing_binary ,W.hat=W.hat.multi.all_sowing_binary,Y.hat=Y.hat.multi.all_sowing_binary,seed=2) 

varimp.multi_sowing_cf_binary <- variable_importance(multi_sowing.forest_binary)

# Average treatment effects
multi_sowing_ate_binary=average_treatment_effect(multi_sowing.forest_binary,target.sample = "overlap")
multi_sowing_ate_binary
  estimate    std.err 
0.22488474 0.01897671 
multi_sowing_binary_calibration=test_calibration(multi_sowing.forest_binary)
multi_sowing_binary_calibration

Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:

                               Estimate Std. Error t value    Pr(>t)    
mean.forest.prediction         1.069524   0.084024 12.7288 < 2.2e-16 ***
differential.forest.prediction 1.728729   0.280558  6.1617 3.782e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tau.hat_sowing=predict(multi_sowing.forest_binary , target.sample = "all",estimate.variance=TRUE)
summary(tau.hat_sowing$predict)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.01807  0.18037  0.23215  0.22897  0.27845  0.42748 

3.1.2 Understanding Mechanisms

We can use the model to understand how farmers who used different amounts of fertilizer or inputs benefit from early sowing of wheat. In addition, we can investigate if farmers who are above the heat stress threshold gain from early sowing.

X_cf_sowingtau=data.frame(X_cf_sowing,tau.hat_sowing)

library(ggplot2)
ggplot(X_cf_sowingtau,
       aes(x = predictions)) +
  geom_histogram() +
  xlab('CATE') +
  geom_vline(xintercept = 0, col = 'black', linetype = 'dashed') +
  geom_vline(xintercept = multi_sowing_ate_binary["estimate"], col = 'red') +
  theme_bw()

# nitrogen
library(ggplot2)
sowingCATENitrogen=ggplot(X_cf_sowingtau,aes(Nperha,predictions))+
  geom_smooth(method="loess",formula=y~x,col="darkblue")+
  labs(x="Applied nitrogen (kg/ha)",y="Sowing treatment effect")
previous_theme <- theme_set(theme_bw())
sowingCATENitrogen

sowingCATEtemp=ggplot(X_cf_sowingtau,aes(temp,predictions))+
  geom_smooth(method="loess",formula=y~x,col="darkblue")+
  labs(x="Temperature",y="Sowing treatment effect")
previous_theme <- theme_set(theme_bw())
sowingCATEtemp

sowingCATEtempFeb=ggplot(X_cf_sowingtau,aes(feb_mean_temp,predictions))+
  geom_smooth(method="loess",formula=y~x,col="darkblue")+
  labs(x="February temperature",y="Sowing treatment effect")
previous_theme <- theme_set(theme_bw())
sowingCATEtempFeb

sowingCATEtempMarch=ggplot(X_cf_sowingtau,aes(march_mean_temp,predictions))+
  geom_smooth(method="loess",formula=y~x,col="darkblue")+
  labs(x="March temperature",y="Sowing treatment effect")
previous_theme <- theme_set(theme_bw())
sowingCATEtempMarch

sowingCATEtempAp=ggplot(X_cf_sowingtau,aes(april_mean_temp,predictions))+
  geom_smooth(method="loess",formula=y~x,col="darkblue")+
  labs(x="April temperature",y="Sowing treatment effect")
previous_theme <- theme_set(theme_bw())
sowingCATEtempAp

sowingCATEprecip=ggplot(X_cf_sowingtau,aes(precip,predictions))+
  geom_smooth(method="loess",formula=y~x,col="darkblue")+
  labs(x="Precipitation",y="Sowing treatment effect")
previous_theme <- theme_set(theme_bw())
sowingCATEprecip

# Mapping
library(sp)
X_cf_sowingtau_sp= SpatialPointsDataFrame(cbind(X_cf_sowingtau$O.largestPlotGPS.Longitude,X_cf_sowingtau$O.largestPlotGPS.Latitude),data=X_cf_sowingtau,proj4string=CRS("+proj=longlat +datum=WGS84"))


library(mapview)
mapviewOptions(fgb = FALSE)
tau.hat_sowing_predictionsmapview=mapview(X_cf_sowingtau_sp,zcol="predictions",layer.name="Early sowing yield gain (t/ha)")
tau.hat_sowing_predictionsmapview
# Based on classifications of treatment gains

library(gtools)
X_cf_sowingtau$sowing_treat_bin <- quantcut(X_cf_sowingtau$predictions)
table(X_cf_sowingtau$sowing_treat_bin)

[-0.0181,0.18]   (0.18,0.232]  (0.232,0.278]  (0.278,0.427] 
          1891           1890           1890           1891 
#Represent it
p <- X_cf_sowingtau %>%
  ggplot( aes(x=temp, fill=sowing_treat_bin)) +
    geom_boxplot()
p

The above analysis however shows only the benefits to those farmers who experienced that level of stress. What about those who didn’t experience it but could have experienced it? For a counterfactual analysis of how each farmer would have gained or lost if they had a terminal stress and had sown early assuming the same levels of use other inputs, we predict the treatment effect under the assumption that all variables are the same except for the maximum temperature which is fixed at a value above 31 (e.g., 32).

3.1.3 Early sowing gains in heat stress of 32

X_cf_sowing_pred=X_cf_sowing
X_cf_sowing_pred$temp=32

tau.hat_sowing_heat=predict(multi_sowing.forest_binary,X_cf_sowing_pred,estimate.variance=TRUE)
summary(tau.hat_sowing_heat$predict)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.02099 0.19525 0.24553 0.24207 0.28682 0.44038 
X_cf_sowingtau=data.frame(X_cf_sowing,tau.hat_sowing)

3.2 Alternative estimates using Bayesian geoadditive model

library(bamlss)
set.seed(111)
f <-  L.tonPerHectare~I.q5505_weedSeverity_num+I.q5509_diseaseSeverity_num+I.q5506_insectSeverity_num+                                            Nperha+P2O5perha+variety_type_NMWV+G.q5305_irrigTimes+A.q111_fGenderdum+Weedmanaged+temp+precip+wc2.1_30s_elev+                                                     M.q708_marketDistance+nitrogen_0.5cm+sand_0.5cm+soc_5.15cm+s(feb_mean_temp)+ s(march_mean_temp)+s(march_mean_temp)+s(april_mean_temp)+ s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early)

## estimate model.
b <- bamlss(f, data = LDSestim_sow)
AICc 29553.61 logPost -14941.8 logLik -14712.1 edf 64.137 eps 0.8139 iteration   1
AICc 22123.20 logPost -11175.0 logLik -11026.1 edf 35.307 eps 0.6797 iteration   2
AICc 17521.03 logPost -8837.69 logLik -8725.31 edf 35.032 eps 33.438 iteration   3
AICc 15673.58 logPost -7908.08 logLik -7791.96 edf 44.552 eps 0.5045 iteration   4
AICc 15387.00 logPost -7769.64 logLik -7639.88 edf 53.230 eps 0.1507 iteration   5
AICc 15369.86 logPost -7767.91 logLik -7629.67 edf 54.846 eps 0.0220 iteration   6
AICc 15365.97 logPost -7914.57 logLik -7627.48 edf 55.093 eps 0.0016 iteration   7
AICc 15364.21 logPost -9319.89 logLik -7626.58 edf 55.107 eps 0.0007 iteration   8
AICc 15363.14 logPost -16228.9 logLik -7626.09 edf 55.064 eps 0.0004 iteration   9
AICc 15362.55 logPost -16227.9 logLik -7625.82 edf 55.043 eps 0.0003 iteration  10
AICc 15362.17 logPost -16226.7 logLik -7625.66 edf 55.015 eps 0.0002 iteration  11
AICc 15361.92 logPost -16225.5 logLik -7625.56 edf 54.985 eps 0.0002 iteration  12
AICc 15361.76 logPost -16224.2 logLik -7625.51 edf 54.955 eps 0.0001 iteration  13
AICc 15361.63 logPost -16222.7 logLik -7625.47 edf 54.930 eps 0.0001 iteration  14
AICc 15361.61 logPost -16222.4 logLik -7625.47 edf 54.924 eps 0.0000 iteration  15
AICc 15361.61 logPost -16222.4 logLik -7625.47 edf 54.924 eps 0.0000 iteration  15
elapsed time:  5.73sec
Starting the sampler...

|                    |   0% 54.74sec
|*                   |   5% 52.63sec  2.77sec
|**                  |  10% 50.40sec  5.60sec
|***                 |  15% 48.11sec  8.49sec
|****                |  20% 46.20sec 11.55sec
|*****               |  25% 43.74sec 14.58sec
|******              |  30% 41.79sec 17.91sec
|*******             |  35% 39.41sec 21.22sec
|********            |  40% 36.50sec 24.33sec
|*********           |  45% 33.43sec 27.35sec
|**********          |  50% 30.40sec 30.40sec
|***********         |  55% 27.58sec 33.71sec
|************        |  60% 24.63sec 36.94sec
|*************       |  65% 21.59sec 40.10sec
|**************      |  70% 18.54sec 43.27sec
|***************     |  75% 15.51sec 46.54sec
|****************    |  80% 12.41sec 49.66sec
|*****************   |  85%  9.34sec 52.91sec
|******************  |  90%  6.22sec 56.02sec
|******************* |  95%  3.12sec 59.30sec
|********************| 100%  0.00sec  1.04min
summary(b)

Call:
bamlss(formula = f, data = LDSestim_sow)
---
Family: gaussian 
Link function: mu = identity, sigma = log
*---
Formula mu:
---
L.tonPerHectare ~ I.q5505_weedSeverity_num + I.q5509_diseaseSeverity_num + 
    I.q5506_insectSeverity_num + Nperha + P2O5perha + variety_type_NMWV + 
    G.q5305_irrigTimes + A.q111_fGenderdum + Weedmanaged + temp + 
    precip + wc2.1_30s_elev + M.q708_marketDistance + nitrogen_0.5cm + 
    sand_0.5cm + soc_5.15cm + s(feb_mean_temp) + s(march_mean_temp) + 
    s(march_mean_temp) + s(april_mean_temp) + s(O.largestPlotGPS.Longitude, 
    O.largestPlotGPS.Latitude, by = Sowing_Date_Early)
-
Parametric coefficients:
                                  Mean       2.5%        50%      97.5%
(Intercept)                  2.498e+00  7.176e-01  2.482e+00  4.284e+00
I.q5505_weedSeverity_num    -7.480e-02 -9.621e-02 -7.499e-02 -5.514e-02
I.q5509_diseaseSeverity_num -1.750e-02 -4.325e-02 -1.778e-02  7.704e-03
I.q5506_insectSeverity_num   1.275e-02 -1.292e-02  1.285e-02  3.689e-02
Nperha                       1.950e-03  1.491e-03  1.944e-03  2.413e-03
P2O5perha                    2.302e-03  1.432e-03  2.308e-03  3.193e-03
variety_type_NMWV            4.062e-01  3.701e-01  4.057e-01  4.416e-01
G.q5305_irrigTimes           3.628e-01  3.399e-01  3.629e-01  3.836e-01
A.q111_fGenderdum           -5.990e-02 -1.473e-01 -6.021e-02  3.105e-02
Weedmanaged                  2.929e-01  2.543e-01  2.930e-01  3.301e-01
temp                        -4.812e-02 -1.159e-01 -4.767e-02  1.776e-02
precip                       1.010e-05 -1.305e-04  1.478e-05  1.277e-04
wc2.1_30s_elev              -7.424e-04 -1.611e-03 -7.570e-04  8.498e-05
M.q708_marketDistance       -7.562e-03 -1.135e-02 -7.586e-03 -3.843e-03
nitrogen_0.5cm              -1.840e-01 -2.902e-01 -1.831e-01 -8.758e-02
sand_0.5cm                   1.264e-02  6.292e-03  1.266e-02  1.882e-02
soc_5.15cm                   1.622e-02  7.205e-03  1.618e-02  2.581e-02
                            parameters
(Intercept)                      2.425
I.q5505_weedSeverity_num        -0.077
I.q5509_diseaseSeverity_num     -0.017
I.q5506_insectSeverity_num       0.013
Nperha                           0.002
P2O5perha                        0.002
variety_type_NMWV                0.398
G.q5305_irrigTimes               0.364
A.q111_fGenderdum               -0.068
Weedmanaged                      0.296
temp                            -0.045
precip                           0.000
wc2.1_30s_elev                  -0.001
M.q708_marketDistance           -0.008
nitrogen_0.5cm                  -0.187
sand_0.5cm                       0.013
soc_5.15cm                       0.017
-
Acceptance probability:
      Mean 2.5% 50% 97.5%
alpha    1    1   1     1
-
Smooth terms:
                                                                                        Mean
s(feb_mean_temp).tau21                                                             2.809e-01
s(feb_mean_temp).alpha                                                             1.000e+00
s(feb_mean_temp).edf                                                               3.573e+00
s(march_mean_temp).tau21                                                           2.065e-01
s(march_mean_temp).alpha                                                           1.000e+00
s(march_mean_temp).edf                                                             3.626e+00
s(april_mean_temp).tau21                                                           1.141e-01
s(april_mean_temp).alpha                                                           1.000e+00
s(april_mean_temp).edf                                                             2.923e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).tau21 1.994e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).alpha 1.000e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).edf   2.429e+01
                                                                                        2.5%
s(feb_mean_temp).tau21                                                             1.209e-04
s(feb_mean_temp).alpha                                                             1.000e+00
s(feb_mean_temp).edf                                                               1.037e+00
s(march_mean_temp).tau21                                                           1.956e-04
s(march_mean_temp).alpha                                                           1.000e+00
s(march_mean_temp).edf                                                             1.047e+00
s(april_mean_temp).tau21                                                           3.300e-04
s(april_mean_temp).alpha                                                           1.000e+00
s(april_mean_temp).edf                                                             1.126e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).tau21 8.507e-01
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).alpha 1.000e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).edf   2.134e+01
                                                                                         50%
s(feb_mean_temp).tau21                                                             5.296e-02
s(feb_mean_temp).alpha                                                             1.000e+00
s(feb_mean_temp).edf                                                               3.464e+00
s(march_mean_temp).tau21                                                           9.581e-02
s(march_mean_temp).alpha                                                           1.000e+00
s(march_mean_temp).edf                                                             3.831e+00
s(april_mean_temp).tau21                                                           1.548e-02
s(april_mean_temp).alpha                                                           1.000e+00
s(april_mean_temp).edf                                                             2.607e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).tau21 1.817e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).alpha 1.000e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).edf   2.438e+01
                                                                                       97.5%
s(feb_mean_temp).tau21                                                             1.956e+00
s(feb_mean_temp).alpha                                                             1.000e+00
s(feb_mean_temp).edf                                                               7.119e+00
s(march_mean_temp).tau21                                                           1.064e+00
s(march_mean_temp).alpha                                                           1.000e+00
s(march_mean_temp).edf                                                             6.369e+00
s(april_mean_temp).tau21                                                           1.149e+00
s(april_mean_temp).alpha                                                           1.000e+00
s(april_mean_temp).edf                                                             6.776e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).tau21 3.916e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).alpha 1.000e+00
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).edf   2.674e+01
                                                                                   parameters
s(feb_mean_temp).tau21                                                                  1.829
s(feb_mean_temp).alpha                                                                     NA
s(feb_mean_temp).edf                                                                    7.075
s(march_mean_temp).tau21                                                                0.000
s(march_mean_temp).alpha                                                                   NA
s(march_mean_temp).edf                                                                  0.953
s(april_mean_temp).tau21                                                                0.000
s(april_mean_temp).alpha                                                                   NA
s(april_mean_temp).edf                                                                  1.109
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).tau21      6.329
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).alpha         NA
s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude,by=Sowing_Date_Early).edf       27.788
---
Formula sigma:
---
sigma ~ 1
-
Parametric coefficients:
               Mean    2.5%     50%   97.5% parameters
(Intercept) -0.4064 -0.4222 -0.4061 -0.3919     -0.411
-
Acceptance probability:
        Mean   2.5%    50% 97.5%
alpha 0.9960 0.9693 1.0000     1
---
Sampler summary:
-
DIC = 15367.27 logLik = -7656.417 pd = 54.4407
runtime = 62.99
---
Optimizer summary:
-
AICc = 15361.61 edf = 54.9242 logLik = -7625.472
logPost = -16222.48 nobs = 7562 runtime = 5.73
## Plot estimated effects.
plot(b)

# ## Predict for each latitude and longitude
pred <- expand.grid(O.largestPlotGPS.Longitude = seq(82, 89, length = 100),O.largestPlotGPS.Latitude = seq(24,28, length = 100))

pred$Sowing_Date_Early=1

pred$I.q5505_weedSeverity_num=mean(LDSestim_sow$I.q5505_weedSeverity_num)


pred$I.q5509_diseaseSeverity_num=mean(LDSestim_sow$I.q5509_diseaseSeverity_num)


pred$I.q5506_insectSeverity_num=mean(LDSestim_sow$I.q5506_insectSeverity_num)  


pred$Nperha=mean(LDSestim_sow$Nperha)

pred$P2O5perha=mean(LDSestim_sow$P2O5perha)

pred$variety_type_NMWV=mean(LDSestim_sow$variety_type_NMWV)

pred$G.q5305_irrigTimes=mean(LDSestim_sow$G.q5305_irrigTimes)

pred$A.q111_fGenderdum=mean(LDSestim_sow$A.q111_fGenderdum)

pred$Weedmanaged=mean(LDSestim_sow$Weedmanaged)

pred$temp=mean(LDSestim_sow$temp)

pred$precip=mean(LDSestim_sow$precip)

pred$wc2.1_30s_elev=mean(LDSestim_sow$wc2.1_30s_elev)  

pred$M.q708_marketDistance=mean(LDSestim_sow$M.q708_marketDistance)

pred$nitrogen_0.5cm=mean(LDSestim_sow$nitrogen_0.5cm)
pred$sand_0.5cm=mean(LDSestim_sow$sand_0.5cm)
pred$soc_5.15cm=mean(LDSestim_sow$soc_5.15cm)
pred$feb_mean_temp=mean(LDSestim_sow$march_mean_temp)
pred$march_mean_temp=mean(LDSestim_sow$march_mean_temp)
pred$april_mean_temp=mean(LDSestim_sow$april_mean_temp)

pred2 <- expand.grid(O.largestPlotGPS.Longitude = seq(82, 89, length = 100),O.largestPlotGPS.Latitude = seq(24,28, length = 100))

pred2$Sowing_Date_Early=0

pred2$I.q5505_weedSeverity_num=mean(LDSestim_sow$I.q5505_weedSeverity_num)

pred2$I.q5509_diseaseSeverity_num=mean(LDSestim_sow$I.q5509_diseaseSeverity_num)

pred2$I.q5506_insectSeverity_num=mean(LDSestim_sow$I.q5506_insectSeverity_num)

pred2$Nperha=mean(LDSestim_sow$Nperha)
pred2$P2O5perha=mean(LDSestim_sow$P2O5perha)
pred2$variety_type_NMWV=mean(LDSestim_sow$variety_type_NMWV)
pred2$G.q5305_irrigTimes=mean(LDSestim_sow$G.q5305_irrigTimes)
pred2$A.q111_fGenderdum=mean(LDSestim_sow$A.q111_fGenderdum)
pred2$Weedmanaged=mean(LDSestim_sow$Weedmanaged)
pred2$temp=mean(LDSestim_sow$temp)
pred2$precip=mean(LDSestim_sow$precip)
pred2$wc2.1_30s_elev=mean(LDSestim_sow$wc2.1_30s_elev)  

pred2$M.q708_marketDistance=mean(LDSestim_sow$M.q708_marketDistance)

pred2$nitrogen_0.5cm=mean(LDSestim_sow$nitrogen_0.5cm)
pred2$sand_0.5cm=mean(LDSestim_sow$sand_0.5cm)
pred2$soc_5.15cm=mean(LDSestim_sow$soc_5.15cm)
pred2$feb_mean_temp=mean(LDSestim_sow$march_mean_temp)
pred2$march_mean_temp=mean(LDSestim_sow$march_mean_temp)
pred2$april_mean_temp=mean(LDSestim_sow$april_mean_temp)

tau_hat <- predict(b,newdata=pred)
tau_hat2 <- predict(b,newdata=pred2)


tau_hat=as.data.frame(tau_hat)
tau_hat2=as.data.frame(tau_hat2)

names(tau_hat)[1:2]=c("mu_1","sigma_1")
names(tau_hat2)[1:2]=c("mu_2","sigma_2")

pred_tau_hat=cbind(pred,tau_hat,tau_hat2)

pred_tau_hat=as.data.frame(pred_tau_hat)

pred_tau_hat$sowing_yield_gain=pred_tau_hat$mu_1-pred_tau_hat$mu_2


# library(terra)
# 
pred_tau_hat$Sowing_Date_Early=NULL
pred_tau_hat$feb_mean_temp=NULL
pred_tau_hat$march_mean_temp=NULL
pred_tau_hat$april_mean_temp=NULL



pred_tau_hat$I.q5505_weedSeverity_num=NULL

pred_tau_hat$I.q5509_diseaseSeverity_num=NULL

pred_tau_hat$I.q5506_insectSeverity_num=NULL

pred_tau_hat$Nperha=NULL
pred_tau_hat$P2O5perha=NULL
pred_tau_hat$variety_type_NMWV=NULL
pred_tau_hat$G.q5305_irrigTimes=NULL
pred_tau_hat$A.q111_fGenderdum=NULL
pred_tau_hat$Weedmanaged=NULL
pred_tau_hat$temp=NULL
pred_tau_hat$precip=NULL
pred_tau_hat$wc2.1_30s_elev=NULL

pred_tau_hat$M.q708_marketDistance=NULL

pred_tau_hat$nitrogen_0.5cm=NULL
pred_tau_hat$sand_0.5cm=NULL
pred_tau_hat$soc_5.15cm=NULL
pred_tau_hat$feb_mean_temp=NULL
pred_tau_hat$march_mean_temp=NULL
pred_tau_hat$april_mean_temp=NULL

pred_tau_hat$mu_1=NULL
pred_tau_hat$sigma_1=NULL
pred_tau_hat$mu_2=NULL
pred_tau_hat$sigma_2=NULL


 myras <- rast(pred_tau_hat, type="xyz")
plot(myras)

 library(raster)
 myras2=raster(myras)
 
library(sf)
India_aoi_sf_dis_sp=as_Spatial(India_aoi_sf_dis)
myras2=mask(myras2,India_aoi_sf_dis_sp)
plot(myras2)

#library(mapview)
#mapview(myras2,layer.name="Yield gain (t/ha) to early sowing")

4 Interpolation for point-geocoded data

4.1 Gridded data input variables

The first strategy is to translate all variables to the grid. This involves interpolation across space and using new variable names. In this case, instead of gender being a dummy, you use a proportion of female or male after interpolation. You can read more details here: https://www.paulamoraga.com/book-spatial/spatial-interpolation-methods.html.

4.1.1 Proximity polygons

library(rio)
LDS=import("LDS_wheat_public_cleaned.csv")

library(sp)
LDS_sp=SpatialPointsDataFrame(cbind(LDS$O.largestPlotGPS.Longitude,LDS$O.largestPlotGPS.Latitude),data=LDS,proj4string=CRS("+proj=longlat +datum=WGS84"))

library(terra)
LDS_v=vect(LDS_sp)
if (!require("rspat")) remotes::install_github('rspatial/rspat')

library(rspat)
v <- voronoi(LDS_v)
plot(v)
points(LDS_v)

v_india_aoi <- crop(v,India_aoi)
plot(v_india_aoi, "yield_kgperha")

e <- extent(c(min(LDS$O.largestPlotGPS.Longitude)-2,max(LDS$O.largestPlotGPS.Longitude)+2,min(LDS$O.largestPlotGPS.Latitude)-2,max(LDS$O.largestPlotGPS.Latitude)+2))

library(raster)
aoi_terra <- rast(ext=e, res=1/6)

vr <- terra::rasterize(v_india_aoi, aoi_terra, "yield_kgperha")
plot(vr)

# Compare with straight rasterizing
straight_rasterize <- terra::rasterize(vect(LDS_sp), aoi_terra, "yield_kgperha")
plot(straight_rasterize)

4.1.2 Nearest-neigbor

library(gstat)
library(raster)

e <- extent(c(min(LDS$O.largestPlotGPS.Longitude)-2,max(LDS$O.largestPlotGPS.Longitude)+2,min(LDS$O.largestPlotGPS.Latitude)-2,max(LDS$O.largestPlotGPS.Latitude)+2))

aoi <- raster(ext=e, res=1/6)

gs <- gstat(formula=yield_kgperha~1, data=LDS_sp, nmax=5, set=list(idp = 0))

nn <- interpolate(aoi, gs, debug.level=0)
nnmsk <- mask(nn, India_aoi_sf)
plot(nnmsk)

library(mapview)
mapview(nnmsk, layer.name="Nearest neigbor interpolated yield")

4.1.3 Inverse distance weighted

For inverse distance, we remove specifications on neigbors and inverse distance being zero in the nearest neighbor code [nmax=5, set=list(idp = 0)].

library(gstat)
gs <- gstat(formula=yield_kgperha~1, data=LDS_sp)
idw <- interpolate(aoi, gs, debug.level=0)
idwr <- mask(idw, India_aoi_sf)
plot(idwr)

library(mapview)
mapview(idwr, layer.name="IDW yield")

4.2 Model-based kriging

This is the well known frequentist geostatistical approach using variogram. Note that this is computationally heavy! I usually use the Bayesian geostatistical models.

For either approach, we assume that covariance between random variables in two locations depends on interlocation distance. This can be modeled using different functions. That is, one may assume that the covariance is an exponential function ^\((C(d_{ii})=\sigma^2 e^{-\phi d_{ii}})\) of interlocation distance with some variogram parameters. Few terms are important to understand. “Sill” is the total variance consisting of the nugget effect \(\tau^2\) and partial sill \(\sigma^2\). \(\phi\) is the decay parameter with \(1/\phi\) being the range, i.e., the distance before reach 95% of the sill.

library(gstat)
v <- variogram(L.tonPerHectare ~ 1, data = LDS_sp)
plot(v)

# Guess parameters of nugget (x=0, y=?), partial sill (x100-x95), range(dist at partial sill)

# vinitial <- vgm(psill = 0.2, model = "Exp",
#                 range = 100, nugget = 0.8)
# 
# plot(v, vinitial, cutoff = 1000, cex = 1.5)
# 
# fv <- fit.variogram(object = v,
#                     model = vgm(psill = 0.5, model = "Sph",
#                                 range = 100, nugget = 0.4))
# fv
# plot(v, fv, cex = 1.5)

# library(ggplot2)
# library(viridis)
# 
# k <- gstat(formula = L.tonPerHectare ~ 1, data = LDS_sp, model = fv)
# 
# kpred <- predict(k, aoi )
# 
# ggplot() + geom_sf(data = kpred, aes(color = var1.pred)) 
# 
# ggplot() + geom_sf(data = kpred, aes(color = var1.var))

4.3 Model-based predictions

4.3.1 Random forest and raster prediction

This approach follows notes from reago website by Robert Hjimans (https://reagro.org/cases/croptrials.html).

4.3.1.1 Interpolate RF model

library(rio)
LDS=import("LDS_wheat_public_cleaned.csv")
table(LDS$A.q103_district,LDS$A.q102_state)
                
                 Bihar UttarPradesh
  Arah             208            0
  Araria           117            0
  Arwal            181            0
  Aurangabad       194            0
  Balia              0          216
  Banka            176            0
  Begusarai        213            0
  Bhagalpur        207            0
  Buxar            207            0
  Chandauli          0          208
  Deoria             0          209
  EastChamparan    205            0
  Gaya             193            0
  Gazipur            0          210
  Gopalganj        210            0
  Gorakhpur          0          210
  Jehanabad        189            0
  Kaimur           204            0
  Katihar          115            0
  Khagaria         205            0
  Kushinagar         0          195
  Lakhisarai       195            0
  Madhepura        169            0
  Maharajganj        0          210
  Mau                0          204
  Munger           210            0
  Muzaffarpur      204            0
  Nalanda          196            0
  Patna            203            0
  Purnia             8            0
  Rohtas           196            0
  Saharsa          163            0
  Samastipur       202            0
  Saran            209            0
  Sheohar          209            0
  Siddharthnagar     0          193
  Siwan            198            0
  Supaul           206            0
  Vaishali         196            0
  WestChamparan    205            0
plot(LDS$O.largestPlotGPS.Longitude, LDS$O.largestPlotGPS.Latitude, col="red", pch=20)

# Random Forest Estimation 
library(randomForest)

RF_model <- randomForest(yield_kgperha ~ O.largestPlotGPS.Longitude + O.largestPlotGPS.Latitude, data=LDS)

varImpPlot(RF_model)

RF_model_pred = predict(RF_model)
plot(LDS$yield_kgperha, RF_model_pred)
abline(0,1)

# Raster prediction

## Create grid with extent
library(raster)

e <- extent(c(min(LDS$O.largestPlotGPS.Longitude)-2,max(LDS$O.largestPlotGPS.Longitude)+2,min(LDS$O.largestPlotGPS.Latitude)-2,max(LDS$O.largestPlotGPS.Latitude)+2))

aoi <- raster(ext=e, res=1/6)

# Interpolate
pp <- interpolate(aoi, RF_model, xyNames=c('O.largestPlotGPS.Longitude', 'O.largestPlotGPS.Latitude'))
pp <- mask(pp, India_aoi_sf)
pp <- crop(pp, India_aoi_sf)
plot(pp)

#points(LDS$O.largestPlotGPS.Longitude, LDS$O.largestPlotGPS.Latitude, col="blue")

library(mapview)
mapview(pp, layer.name="Random forest yield prediction")

4.3.1.2 Raster predict from RF model: Kriging Wheat Prices

The spatial prediction -the variables are spatial- function takes two arguments: the prediction variables and the price prediction model. Both the “stats” package(loaded as a randomForest dependency) and the “raster” package have a function called “predict” that can make predictions. Since we are dealing with spatial data, we add a prefix to the function name to ensure the “predict” function in the raster “package” is used.

# library(ncdf4)
# library(raster)
# library(sf)
# library(data.table)
# library(exactextractr)
# library(terra)
# library(rgdal)
# library(geodata)
# 
# # Step 1
# #dir.create("rasters") # Create a directory to put the downloaded raster files
# #population=population(2015,0.5,path="rasters")
# #elevationglobal_geodata=elevation_global(0.5,path="rasters")
# 
# rasterstack <- stack() 
# raster::crs(rasterstack)="+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# rasterlist <- list.files(path = "rasters",recursive=TRUE, 
#                          pattern = "*.tif$", 
#                          full.names = TRUE) # Character vector of relative filepaths
# 
# 
# for (rasterpath in rasterlist) {
#       rasterfile <- raster(rasterpath)
#       rasterstack <- addLayer(rasterstack, rasterfile)
#     }
# 
# refrenceraster <- rasterstack[[1]]
# 
# LDS_sp=SpatialPointsDataFrame(cbind(LDS$O.largestPlotGPS.Longitude,LDS$O.largestPlotGPS.Latitude),data=LDS,proj4string=CRS("+proj=longlat +datum=WGS84"))
# 
# 
# India_aoi_sp=as_Spatial(India_aoi_sf)
# 
# # # Step 2
# India_aoi_sp.raster <- rasterize(India_aoi_sp, refrenceraster)
# 
# # Step 3
# latitudes <- xFromCell(India_aoi_sp.raster, 1:length(India_aoi_sp.raster))  
# longitudes <- yFromCell(India_aoi_sp.raster, 1:length(India_aoi_sp.raster))
# 
# # Step 4
# India_aoi_sp.raster.lati <- India_aoi_sp.raster.long <- India_aoi_sp.raster
# values(India_aoi_sp.raster.lati) <- latitudes
# values(India_aoi_sp.raster.long) <- longitudes
# 
# # Step 5
# names(India_aoi_sp.raster.long) <- "Longitude"
# names(India_aoi_sp.raster.lati) <- "Latitude"
# 
# 
# rasterstack <- stack(rasterstack, India_aoi_sp.raster.long, India_aoi_sp.raster.lati)
# 
# # step 6
# library(terra)
# predict.vrbs.r = terra::extract(rasterstack,
#                   LDS_sp,
#                   #buffer=1, # Meters
#                  # buffer=5000, # Meters
#                   #small=TRUE,
#                   fun = mean)
# 
# predict.vrbs.r <- predict.vrbs.r[complete.cases(predict.vrbs.r),]
# 
# RF_model_rast <- randomForest(x=predict.vrbs.r,
#                          y=LDS$L.q607_farmGatePricePerKg)
# 
# raster.prediction <- raster::predict(rasterstack, # Prediction variable rasters
#                                      RF_model_rast # Prediction  model
#                                     )   
# 
# 
# 
# raster.prediction.c=crop(raster.prediction,India_aoi_sp)
# raster.prediction.m=mask(raster.prediction.c,India_aoi_sp)
# 
# # Plot the raster
# library(rasterVis)
# 
# raster.prediction.m_plot=levelplot(raster.prediction.m,par.settings=RdBuTheme())
# raster.prediction.m_plot

4.3.2 Spatial Bayesian Geostatistical Gaussian Process Model [Aka Bayesian Kriging]

If one is interested in calculating other measures other than the predicted value (for example, the probability of exceeding some amount), then a Bayesian gaussian process model is the best alternative in that using Markov Chain Monte Carlo simulations one can use a probabilistic assessment.

### Bayesian kriging 

LDS_small <- LDS[sample(1:nrow(LDS),800),] 


coords=dplyr::select(LDS_small,O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude)
coords=as.matrix(coords)

# The public version of the data has duplicated coordinates
# We need to jitter these because spatial Bayesian kriging requires unique coordinates.
library(geoR)
coords=jitterDupCoords(coords,min=2,max=10)
coords=as.matrix(coords)

# Bayesian models take much time to render. We sample 1000 observations to showcase the approach
library(spBayes)
n.samples=1000

t1 <- Sys.time()

r <-1
n.ltr <- r*(r+1)/2

priors <- list("phi.Unif"=list(rep(1,r), rep(10,r)), "K.IW"=list(r, diag(rep(1,r))), "tau.sq.IG"=c(2, 1))

starting <- list("phi"=rep(3/0.5,r), "A"=rep(1,n.ltr), "tau.sq"=1)

tuning <- list("phi"=rep(0.1,r), "A"=rep(0.01, n.ltr), "tau.sq"=0.01)

cf.sowing.sp <- spBayes::spSVC(yield_kgperha~1, data=LDS_small,coords=coords,
                                  starting= starting,
                                  tuning=tuning,
                                  priors=priors,
                                  cov.model="exponential",n.samples=n.samples,
                                  n.omp.threads=15,svc.cols=c("(Intercept)"))
----------------------------------------
    General model description
----------------------------------------
Model fit with 800 observations.

Number of covariates 1.

Number of space varying covariates 1.

Using the exponential spatial correlation model.

Number of MCMC samples 1000.

Priors and hyperpriors:
    beta flat.
    K IW hyperpriors:
    df: 1.00000
    S:
    1.000   

    phi Unif lower bound hyperpriors:   1.000   
    phi Unif upper bound hyperpriors:   10.000  

    tau.sq IG hyperpriors shape=2.00000 and scale=1.00000

Source compiled with OpenMP, posterior sampling is using 15 thread(s).
-------------------------------------------------
        Sampling
-------------------------------------------------
Sampled: 100 of 1000, 10.00%
Report interval Metrop. Acceptance rate: 47.00%
Overall Metrop. Acceptance rate: 47.00%
-------------------------------------------------
Sampled: 200 of 1000, 20.00%
Report interval Metrop. Acceptance rate: 49.00%
Overall Metrop. Acceptance rate: 48.00%
-------------------------------------------------
Sampled: 300 of 1000, 30.00%
Report interval Metrop. Acceptance rate: 25.00%
Overall Metrop. Acceptance rate: 40.33%
-------------------------------------------------
Sampled: 400 of 1000, 40.00%
Report interval Metrop. Acceptance rate: 26.00%
Overall Metrop. Acceptance rate: 36.75%
-------------------------------------------------
Sampled: 500 of 1000, 50.00%
Report interval Metrop. Acceptance rate: 28.00%
Overall Metrop. Acceptance rate: 35.00%
-------------------------------------------------
Sampled: 600 of 1000, 60.00%
Report interval Metrop. Acceptance rate: 27.00%
Overall Metrop. Acceptance rate: 33.67%
-------------------------------------------------
Sampled: 700 of 1000, 70.00%
Report interval Metrop. Acceptance rate: 27.00%
Overall Metrop. Acceptance rate: 32.71%
-------------------------------------------------
Sampled: 800 of 1000, 80.00%
Report interval Metrop. Acceptance rate: 32.00%
Overall Metrop. Acceptance rate: 32.62%
-------------------------------------------------
Sampled: 900 of 1000, 90.00%
Report interval Metrop. Acceptance rate: 24.00%
Overall Metrop. Acceptance rate: 31.67%
-------------------------------------------------
Sampled: 1000 of 1000, 100.00%
Report interval Metrop. Acceptance rate: 26.00%
Overall Metrop. Acceptance rate: 31.10%
-------------------------------------------------
t2 <- Sys.time()
t2 - t1
Time difference of 1.324326 mins
burn.in <- floor(0.75*n.samples)

cf.sowing.sp.r <- spRecover(cf.sowing.sp, start=burn.in)
Source compiled with OpenMP, posterior sampling is using 1 thread(s).
-------------------------------------------------
    Recovering beta and w
-------------------------------------------------
Sampled: 99 of 251, 39.44%
Sampled: 199 of 251, 79.28%
# Kriging
library(terra)
library(stars)
library(raster)
library(gstat) # Use gstat's idw routine
library(sp)    # Used for the spsample function
library(tmap)
library(geodata)

# India=gadm(country="IND", level=1, path=tempdir())
# plot(India)
# India_State_Boundary=subset(India,India$NAME_1=="Bihar")
# plot(India_State_Boundary)
# library(sf)

#India_State_Boundary=st_as_sf(India_State_Boundary)

#India_State_Boundary_Bihar_sp=as_Spatial(India_State_Boundary_Bihar)

library(spBayes)
India_aoi_sf_dis_sp=as_Spatial(India_aoi_sf_dis)

LDS_small_sp=SpatialPointsDataFrame(cbind(LDS_small$O.largestPlotGPS.Longitude,LDS_small$O.largestPlotGPS.Latitude),data=LDS_small,proj4string=CRS("+proj=longlat +datum=WGS84"))


LDS_small_sp@bbox <- India_aoi_sf_dis_sp@bbox

grd <- as.data.frame(spsample(LDS_small_sp, "regular", n=10000))

names(grd)       <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd)     <- TRUE  # Create SpatialPixel object
fullgrid(grd)    <- TRUE  # Create SpatialGrid object
plot(grd)

India_aoi_sf_dis_sp_poly <- India_aoi_sf_dis_sp@polygons[[1]]@Polygons[[1]]@coords
India_aoi_sf_dis_sp_poly <- as.matrix(India_aoi_sf_dis_sp_poly)

pred.coords <- SpatialPoints(grd)@coords
pred.coords =as.matrix(pred.coords)

pred.covars <- as.matrix(rep(1, nrow(pred.coords)))

cf.sowing.sp.pred <- spPredict(cf.sowing.sp.r,pred.coords=pred.coords,
                                    pred.covars=pred.covars, n.omp.threads=15)
Source compiled with OpenMP, posterior sampling is using 15 thread(s).
-------------------------------------------------
    Point-wise sampling of predicted w
-------------------------------------------------
Sampled: 99 of 251, 39.44%
Sampled: 199 of 251, 79.28%
cf.sowing.sp.pred.pred.mu = apply(cf.sowing.sp.pred$p.y.predictive.samples,1,mean)
cf.sowing.sp.pred.sd = apply(cf.sowing.sp.pred$p.y.predictive.samples,1,sd)

library(MBA)
library(fields)

pred.grid <- as.data.frame(list(pred.coords,pred.mu=cf.sowing.sp.pred.pred.mu,pred.sd=cf.sowing.sp.pred.sd))

coordinates(pred.grid) = c("X", "Y")
gridded(pred.grid) <- TRUE
pred.mu.image <- as.image.SpatialGridDataFrame(pred.grid["pred.mu"])
pred.sd.image <- as.image.SpatialGridDataFrame(pred.grid["pred.sd"])


# predict and probability ------------------------------------------------
cf.sowing.sp.pred.pred.prob_3tons=rowSums(cf.sowing.sp.pred$p.y.predictive.samples>3000)/251
cf.sowing.sp.pred.pred.prob_5tons=rowSums(cf.sowing.sp.pred$p.y.predictive.samples>5000)/251


pred.grid <- as.data.frame(list(pred.coords,pred.mu=cf.sowing.sp.pred.pred.mu,pred.sd=cf.sowing.sp.pred.sd,
                                pred.prob_3tons=cf.sowing.sp.pred.pred.prob_3tons,
                                pred.prob_5tons=cf.sowing.sp.pred.pred.prob_5tons))

coordinates(pred.grid) = c("X", "Y")
gridded(pred.grid) <- TRUE

pred.mu.image <- as.image.SpatialGridDataFrame(pred.grid["pred.mu"])
pred.sd.image <- as.image.SpatialGridDataFrame(pred.grid["pred.sd"])
pred.prob.image_3tons <- as.image.SpatialGridDataFrame(pred.grid["pred.prob_3tons"])
pred.prob.image_5tons<- as.image.SpatialGridDataFrame(pred.grid["pred.prob_5tons"])

# Rastervis
library(rasterVis)
pred.mu=pred.grid["pred.mu"]
pred.mu=raster(pred.mu)
pred.mu=mask(pred.mu,India_aoi_sf_dis_sp)
pred.mu_plot=levelplot(pred.mu,par.settings=RdBuTheme(),contour=TRUE)
pred.mu_plot

#mapview(pred.mu_plot,layer.name="Mean posterior yield")

# Standard deviation
library(rasterVis)
pred.sd=pred.grid["pred.sd"]
pred.sd=raster(pred.sd)
pred.sd=mask(pred.sd,India_aoi_sf_dis_sp)
pred.sd_plot=levelplot(pred.sd,par.settings=RdBuTheme(),contour=TRUE)
pred.sd_plot

#mapview(pred.sd_plot, layer.name="posterior SD")

# Probability of 3 tons
pred.prob_3tons=pred.grid["pred.prob_3tons"]
pred.prob_3tons=raster(pred.prob_3tons)
pred.prob_3tons=mask(pred.prob_3tons,India_aoi_sf_dis_sp)
pred.prob_3tons_plot=levelplot(pred.prob_3tons,par.settings=RdBuTheme(),contour=TRUE)
pred.prob_3tons_plot

library(mapview)
#mapview(pred.prob_3tons_plot, layer.name="Probability of >3 tons")

# Probability of 5 tons
pred.prob_5tons=pred.grid["pred.prob_5tons"]
pred.prob_5tons=raster(pred.prob_5tons)
pred.prob_5tons=mask(pred.prob_5tons,India_aoi_sf_dis_sp)
pred.prob_5tons_plot=levelplot(pred.prob_5tons,par.settings=RdBuTheme(),contour=TRUE)
pred.prob_5tons_plot

library(mapview)
#mapview(pred.prob_5tons_plot,layer.name="Probability of >5 tons")

4.3.3 Spatial Bayesian Geoadditive Model

library(bamlss)
set.seed(111)
f <- yield_kgperha ~  s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude)

## estimate model.
b <- bamlss(f, data = LDS)
AICc 124684.3 logPost -62379.2 logLik -62338.1 edf 4.0017 eps 0.0571 iteration   1
AICc 124680.3 logPost -62412.1 logLik -62336.1 edf 4.0285 eps 0.0001 iteration   2
AICc 124642.1 logPost -62435.7 logLik -62316.7 edf 4.2724 eps 0.0012 iteration   3
AICc 124399.2 logPost -62396.0 logLik -62193.6 edf 5.9280 eps 0.0079 iteration   4
AICc 123883.3 logPost -62208.8 logLik -61930.0 edf 11.637 eps 0.0198 iteration   5
AICc 123525.4 logPost -62020.8 logLik -61740.3 edf 22.327 eps 0.0173 iteration   6
AICc 123437.4 logPost -61950.1 logLik -61689.0 edf 29.494 eps 0.0105 iteration   7
AICc 123436.0 logPost -61951.6 logLik -61687.5 edf 30.404 eps 0.0014 iteration   8
AICc 123436.0 logPost -61951.6 logLik -61687.5 edf 30.404 eps 0.0000 iteration   9
AICc 123436.0 logPost -61951.6 logLik -61687.5 edf 30.404 eps 0.0000 iteration   9
elapsed time:  1.06sec
Starting the sampler...

|                    |   0% 26.18sec
|*                   |   5% 25.84sec  1.36sec
|**                  |  10% 22.86sec  2.54sec
|***                 |  15% 22.10sec  3.90sec
|****                |  20% 21.16sec  5.29sec
|*****               |  25% 20.28sec  6.76sec
|******              |  30% 19.20sec  8.23sec
|*******             |  35% 18.01sec  9.70sec
|********            |  40% 16.63sec 11.09sec
|*********           |  45% 15.38sec 12.58sec
|**********          |  50% 14.12sec 14.12sec
|***********         |  55% 12.85sec 15.70sec
|************        |  60% 11.49sec 17.23sec
|*************       |  65% 10.08sec 18.72sec
|**************      |  70%  8.63sec 20.14sec
|***************     |  75%  7.22sec 21.65sec
|****************    |  80%  5.76sec 23.03sec
|*****************   |  85%  4.32sec 24.48sec
|******************  |  90%  2.87sec 25.87sec
|******************* |  95%  1.44sec 27.33sec
|********************| 100%  0.00sec 28.84sec
## Plot estimated effects.
plot(b)

## Predict for each latitude and longitude
pred <- expand.grid(O.largestPlotGPS.Longitude = seq(82, 89, length = 100),O.largestPlotGPS.Latitude = seq(24,28, length = 100))
                    
yield_hat <- predict(b,newdata=pred)

yield_hat=as.data.frame(yield_hat)

pred_yield_hat=cbind(pred,yield_hat)

#pred_yield_hat$sigma=NULL
library(terra)

myras <- rast(pred_yield_hat, type="xyz")
plot(myras)

library(raster)
myras2=raster(myras)

library(sf)
India_aoi_sf_dis_sp=as_Spatial(India_aoi_sf_dis)
myras2=mask(myras2,India_aoi_sf_dis_sp)
plot(myras2)

library(mapview)
mapview(myras2, layer.name="Geoadditive Yield" )

4.3.4 BAMLSS gridded treatment effects: post-processing

library(bamlss)
set.seed(111)
f <- predictions ~  s(O.largestPlotGPS.Longitude,O.largestPlotGPS.Latitude)

## estimate model.
b <- bamlss(f, data = X_cf_sowingtau)
AICc -24865.5 logPost 12472.15 logLik 12463.39 edf 30.476 eps 0.1084 iteration   1
AICc -25039.2 logPost 12558.82 logLik 12550.43 edf 30.694 eps 0.0167 iteration   2
AICc -25041.2 logPost 12559.79 logLik 12551.51 edf 30.750 eps 0.0019 iteration   3
AICc -25041.2 logPost 12559.79 logLik 12551.51 edf 30.755 eps 0.0000 iteration   4
AICc -25041.2 logPost 12559.79 logLik 12551.51 edf 30.755 eps 0.0000 iteration   4
elapsed time:  0.39sec
Starting the sampler...

|                    |   0% 24.99sec
|*                   |   5% 25.27sec  1.33sec
|**                  |  10% 23.49sec  2.61sec
|***                 |  15% 22.27sec  3.93sec
|****                |  20% 21.44sec  5.36sec
|*****               |  25% 20.40sec  6.80sec
|******              |  30% 19.34sec  8.29sec
|*******             |  35% 18.05sec  9.72sec
|********            |  40% 16.71sec 11.14sec
|*********           |  45% 15.50sec 12.68sec
|**********          |  50% 14.14sec 14.14sec
|***********         |  55% 12.77sec 15.61sec
|************        |  60% 11.42sec 17.13sec
|*************       |  65% 10.02sec 18.60sec
|**************      |  70%  8.62sec 20.11sec
|***************     |  75%  7.20sec 21.61sec
|****************    |  80%  5.76sec 23.05sec
|*****************   |  85%  4.32sec 24.49sec
|******************  |  90%  2.87sec 25.80sec
|******************* |  95%  1.42sec 27.04sec
|********************| 100%  0.00sec 28.55sec
## Plot estimated effects.
plot(b)

## Predict for each latitude and longitude
pred <- expand.grid(O.largestPlotGPS.Longitude = seq(82, 89, length = 100),O.largestPlotGPS.Latitude = seq(24,28, length = 100))
                    
yield_hat <- predict(b,newdata=pred)

yield_hat=as.data.frame(yield_hat)

pred_yield_hat=cbind(pred,yield_hat)

#pred_yield_hat$sigma=NULL
library(terra)

myras <- rast(pred_yield_hat, type="xyz")
plot(myras)

library(raster)
myras2=raster(myras)

library(sf)
India_aoi_sf_dis_sp=as_Spatial(India_aoi_sf_dis)
myras2=mask(myras2,India_aoi_sf_dis_sp)
plot(myras2)

5 Areal data

There two important cases in which one may have areal data to use in climate adaptation prioritization. The first case is in which there is plot or farm level plot but that the statistical agency did not provide individual level coordinates. In these cases, the only of relating the individual observations to spatial data is through a higher level spatial level. Secondly, there are cases especially with administrative data in which the data are available only at the aggregated spatial level.

5.1 Using centroid

One can get the centroid of each polygon, e.g., then fit a geostatistical model as in the spatial Bayesian Geostatistical Gaussian Process Model or geoadditive model as above.

5.1.1 Markov Random Field (MRF) Geoadditive Structured and Unstructured Spatial Model

In the case where there are data at individual farm level albeit not have geocoordinates,one can use structured geoadditive model to ascertain the patterns of the explained and unexplained spatial effect.

library(BayesX)
library("R2BayesX")


India_aoi_sp=as_Spatial(India_aoi_sf)

library(rgdal)
#writeOGR(India_aoi_sp,dsn="shp",layer="India_aoi_sp",driver = "ESRI Shapefile")

shpname <- file.path(getwd(), "shp" , "India_aoi_sp")

India_aoi_sp_bnd <- BayesX::shp2bnd(shpname=shpname, regionnames = "NAME_2", check.is.in = F)
Reading map ... finished
Note: map consists originally of 50 polygons
Note: map consists of 47 regions
#write.bnd(India_aoi_sp_bnd, "shp/India_aoi_sp_bnd.bnd", replace = FALSE)

# za <- bayesx(dp2011 ~ code +   sx(ID_2, bs = "mrf", map = India_aoi_sp_bnd) +
#                sx(ID_2, bs = "re"), iter = 1200,  step = 10, data = India_aoi_sp_bnd)

5.2 Other Areal-Point Downscaling Methods

6 Conclusion

The notebook has shown the possibilities of estimating heterogeneous treatment effects while ensure causality, and nonlinearity.

7 References

7.1 Journal papers

Athey, S., and Wager, S. 2021. “Policy learning with observational data.” Econometrica 89(1): 133-161. Doi: https://doi.org/10.3982/ECTA15732.

Wager, S., and Athey, S. 2018. “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests”. Journal of the American Statistical Association 113(523): 1228-1242. Doi: https://doi.org/10.1080/01621459.2017.1319839.

Zhou, Z., Athey, S., Wager, S. 2022. “Offline multi-action policy learning: Generalization and optimization.” Operations Research 71 (1): 148-183. Doi: https://doi.org/10.1287/opre.2022.2271.

7.2 Books

Banerjee, S., Carlin, B.P., Gelfand, A.E. 2015. “Hierarchical modeling and analysis for spatial data”. 2nd Edition. Chapman & Hall/CRC Monographs on Statistics and Applied Probability.

Congdon, P.D. Bayesian Hierarchical Models With Applications Using R, Second Edition.

Moraga, P.2023.Spatial Statistics for Data Science: Theory and Practice with R. Url: https://www.paulamoraga.com/book-spatial/index.html